Abstract

The effects of error propagation in the reproduction of diploid organisms are
studied within the population genetics framework of the quasispecies model. The
dependence of the error threshold on the dominance parameter is fully
investigated. In particular, it is shown that dominance can protect the
wild-type alleles from the error catastrophe. The analysis is restricted to a
diploid analogue of the single-peaked fitness landscape.
The finding that the length of self-reproducing molecules that compete for a
finite supply of resources is limited by their replication accuracy is probably
the main outcome of Eigen's quasispecies model (Eigen 1971). This phenomenon,
termed the error threshold, poses an interesting challenge to theories of the
origin of life, since it prevents the emergence of very large molecules that
could carry the information necessary for building a complex metabolism (Eigen
and Schuster 1979, Kauffman 1993).

In the quasispecies model, a molecule is represented by a string of $\nu$ digits
$s = (s_1, s_2, \ldots, s_\nu)$, with the variables $s_i$ allowed to take on
$\kappa$ different values, each representing a different type of monomer used
to build the molecule. The focus is on the time evolution of the concentrations
$x_i$ of molecules of type $i = 1, 2, \ldots, \kappa^\nu$, which obey the
following differential equations (Eigen 1971):

$$ \frac{dx_i}{dt} = \sum_j W_{ij}\, x_j - \left[ D_i + \Phi(t) \right] x_i, \qquad (1) $$

where the constants $D_i$ stand for the death probability of molecules of type
$i$ and $\Phi(t)$ is a dilution flux that keeps the total concentration
constant. The feature that distinguishes this model from the well-established
models of population genetics (Hartl and Clark 1989) is the replication matrix
$W$, which takes into account the primary structure of the molecules. More
specifically, its elements are given by

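$$ W_{ii} = A_i\, q^{\nu} \qquad (2) $$

and

$$ W_{ij} = \left( \frac{1}{\kappa - 1} \right)^{d(i,j)} q^{\nu - d(i,j)}\, (1-q)^{d(i,j)}\, A_j, \quad i \neq j, \qquad (3) $$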

where $A_i$ is the replication rate of molecules of type $i$, $d(i,j)$ is the
Hamming distance between molecules $i$ and $j$, and $q \in [0, 1]$ is the
single-monomer replication accuracy, which is assumed to be the same for all
monomers.
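To make equations (1) to (3) concrete, the following Python sketch builds the replication matrix for short strings and integrates the kinetic equations. It is an illustration under stated assumptions, not the procedure used in the works cited here: it sets $D_i = 0$ for all $i$, chooses $\Phi(t) = \sum_{i,j} W_{ij} x_j$ so that $\sum_i x_i$ stays equal to 1, and uses explicit Euler steps. The helper names (`replication_matrix`, `integrate`, `single_peak_run`) and the single-peaked choice $A = 1 + a$ for the master string and $A = 1$ otherwise are hypothetical.

```python
from itertools import product


def replication_matrix(nu, kappa, q, A):
    """Return (strings, W) for all kappa**nu strings of length nu.

    W[i][j] = A[j] * (1/(kappa-1))**d * q**(nu-d) * (1-q)**d, where d is the
    Hamming distance d(i, j); d = 0 gives W[i][i] = A[i] * q**nu (equations (2)-(3)).
    """
    strings = list(product(range(kappa), repeat=nu))
    W = []
    for si in strings:
        row = []
        for j, sj in enumerate(strings):
            d = sum(x != y for x, y in zip(si, sj))  # Hamming distance d(i, j)
            row.append((1.0 / (kappa - 1)) ** d * q ** (nu - d) * (1.0 - q) ** d * A[j])
        W.append(row)
    return strings, W


def integrate(W, x, dt=0.01, steps=5000):
    """Explicit Euler steps for equation (1), assuming D_i = 0 for all i.

    Phi(t) = sum_i (W x)_i, which keeps sum_i x_i = 1 when it starts at 1.
    """
    n = len(x)
    for _ in range(steps):
        growth = [sum(W[i][j] * x[j] for j in range(n)) for i in range(n)]
        phi = sum(growth)
        x = [x[i] + dt * (growth[i] - phi * x[i]) for i in range(n)]
    return x


def single_peak_run(nu, q, a):
    """Hypothetical helper: binary strings, master string (1, ..., 1) with A = 1 + a,
    all other strings with A = 1; start from the uniform distribution."""
    strings = list(product(range(2), repeat=nu))
    master = strings.index((1,) * nu)
    A = [1.0 + a if k == master else 1.0 for k in range(len(strings))]
    _, W = replication_matrix(nu, 2, q, A)
    x = integrate(W, [1.0 / len(strings)] * len(strings))
    return x[master], x


x_master, x = single_peak_run(4, 0.99, 9.0)
print(round(x_master, 3), round(sum(x), 12))
```

Because $\sum_i W_{ij} = A_j$ (the mutation probabilities out of string $j$ sum to one), $\Phi(t)$ in this sketch equals the mean replication rate. With $\nu = 4$ and $a = 9$, every string keeps the same concentration at $q = 0.5$, while at $q = 0.99$ the master string dominates.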

For simple replication landscapes, the solutions of the $\kappa^\nu$ kinetic
equations (1) have been thoroughly studied using perturbation theory (Eigen et
al 1989). More complex, spin-glass-like replication landscapes can be analysed
using the correspondence between those equations and the equilibrium properties
of a surface lattice system (Leuthauser 1986, Leuthauser 1987, Tarazona 1992,
Franz et al 1993). It is worth mentioning that for the single-peaked
replication landscape the exact stationary solution of equations (1) can be
obtained by mapping them onto a polymer localization problem (Galluccio et al
1996). Recently, a population genetics approach to the quasispecies model has
been proposed that, in spite of its simplicity, yields results qualitatively
similar to those obtained by solving the kinetic equations (Alves and Fontanari
1996).

An alternative interpretation of the quasispecies model is obtained by
considering the $\kappa^\nu$ different strings $s$ as different forms (alleles)
of a certain gene that determines the fitness $A_i$ of haploid organisms. Thus,
this model is equivalent to the classical one-locus, multiple-allele model of
population genetics (Hartl and Clark 1989), except for the mutation mechanism,
which must be adapted to satisfy the constraints imposed by the internal
structure of the alleles. Accordingly, Wiehe et al (1995) have generalized the
original haploid formulation of the quasispecies model to the evolution of
diploid organisms. An important by-product of that analysis is the study of the
effects of dominance on error thresholds, which has led to an interesting
conjecture about the evolution of dominance.

In this paper we employ the population genetics formulation of the quasispecies
model to investigate error propagation in the reproduction of diploid
organisms. This approach allows us to study in great detail the dynamical
behavior of the model in the full space of the control parameters $\nu$ and
$q$, as well as in the space of the parameters that specify the fitness
landscapes. We should mention that the analysis of Wiehe et al (1995) was based
on the numerical solution of the diploid counterpart of the kinetic equations
(1) and on very crude approximations that neglect the effects of back
mutations.

In the population genetics formulation, the $\kappa^\nu$ different alleles are
grouped into $(\nu + \kappa - 1)!/[\nu!\,(\kappa - 1)!]$ classes, according to
the number of monomers of each type they have, regardless of their specific
position inside the allele. Hence, a given class is characterized by the vector
$P = (P_1, P_2, \ldots, P_\kappa)$, where $P_\alpha$ is the number of monomers
of type $\alpha$ in any allele inside that class. Clearly,
$\sum_\alpha P_\alpha = \nu$. The alleles belonging to the same class are
assumed to be equivalent, in the sense that their presence confers the same
fitness value on the genotypes. The crucial simplifying assumption of the
population genetics approach is that, given the monomer frequencies in
generation $t$, $p_\alpha(t)$ with $\sum_\alpha p_\alpha(t) = 1$, the
frequencies of alleles in class $P$ are given by the multinomial distribution

$$ \Pi_t(P) = C_P\, [p_1(t)]^{P_1} [p_2(t)]^{P_2} \cdots [p_\kappa(t)]^{P_\kappa}, \qquad (4) $$

where $C_P = \nu!/(P_1!\, P_2! \cdots P_\kappa!)$. Thus, at generation $t$, the
monomers are sampled with replacement from an urn containing $\kappa$ different
types of monomers in the proportions $p_\alpha(t)$, $\alpha = 1, \ldots, \kappa$.
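The grouping into classes and the multinomial distribution (4) can be checked numerically. In this minimal Python sketch, the helper names `classes` and `class_frequency` are hypothetical and the monomer frequencies are arbitrary illustrative values; the number of classes is compared with $(\nu+\kappa-1)!/[\nu!(\kappa-1)!]$ and the class frequencies are summed.

```python
from math import comb, factorial, prod


def classes(nu, kappa):
    """Yield every class vector P = (P_1, ..., P_kappa) with non-negative entries summing to nu."""
    if kappa == 1:
        yield (nu,)
        return
    for first in range(nu + 1):
        for rest in classes(nu - first, kappa - 1):
            yield (first,) + rest


def class_frequency(P, p):
    """Multinomial frequency Pi_t(P) of equation (4) for monomer frequencies p = (p_1, ..., p_kappa)."""
    C = factorial(sum(P)) // prod(factorial(Pa) for Pa in P)  # C_P = nu! / (P_1! ... P_kappa!)
    return C * prod(pa ** Pa for pa, Pa in zip(p, P))


nu, kappa = 10, 3
p = (0.5, 0.3, 0.2)  # hypothetical monomer frequencies
all_classes = list(classes(nu, kappa))
print(len(all_classes), comb(nu + kappa - 1, kappa - 1))  # number of classes, two ways
print(sum(class_frequency(P, p) for P in all_classes))     # should be 1 up to rounding
```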

Let $A(P^i, P^j) = A(P^j, P^i)$ denote the fitness of the genotypes $P^iP^j$,
i.e., genotypes composed of any pair of alleles belonging to classes $P^i$ and
$P^j$. Then the fraction of monomers $\alpha$ that the genotype $P^iP^j$
contributes to generation $t+1$ is proportional to the product of three
factors: (a) its frequency in the population, $\Pi_t(P^i, P^j)$; (b) its
fitness, $A(P^i, P^j)$; and (c) the average number of monomers $\alpha$ that
replicate correctly, $q(P^i_\alpha + P^j_\alpha)$, plus the average number of
monomers $\beta \neq \alpha$ that mutate to $\alpha$,
$[(1-q)/(\kappa-1)] \sum_{\beta \neq \alpha} (P^i_\beta + P^j_\beta)$. A simple
calculation, which uses $\sum_\beta P_\beta = \nu$, yields the following
equations for the time evolution of the monomer frequencies:



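$$ p_\alpha(t+1) = \frac{1-q}{\kappa-1} + \left[ q - \frac{1-q}{\kappa-1} \right] \frac{1}{2\nu W_t} \sum_{P^i} \sum_{P^j} \Pi_t(P^i, P^j)\, A(P^i, P^j)\, (P^i_\alpha + P^j_\alpha), \qquad (5) $$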
where

$$ \Pi_t(P^i, P^j) = \Pi_t(P^i)\, \Pi_t(P^j) \qquad (6) $$

and the normalization factor,

$$ W_t = \sum_{P^i} \sum_{P^j} \Pi_t(P^i, P^j)\, A(P^i, P^j), \qquad (7) $$

is the average fitness of the entire population. Here the notation $\sum_P$
stands for $\sum_{P_1=0}^{\nu} \cdots \sum_{P_\kappa=0}^{\nu} \delta(\nu, \sum_\alpha P_\alpha)$,
where $\delta(k, l)$ is the Kronecker delta. It is interesting to note that
equation (5) is identical to the equation governing the evolution of sexually
reproducing haploid organisms (Alves and Fontanari 1996).
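Equations (5) to (7) can be transcribed almost literally for small $\nu$ and $\kappa$. The sketch below is a hypothetical Python transcription that enumerates all pairs of classes; `next_monomer_frequencies` and its `fitness` argument (any symmetric function $A(P^i, P^j)$ of two class vectors) are names introduced here for illustration. The double sum costs $O(N^2)$ operations for $N$ classes, so it is only practical for short strings.

```python
from math import factorial, prod


def classes(nu, kappa):
    """Yield every class vector (P_1, ..., P_kappa) with entries summing to nu."""
    if kappa == 1:
        yield (nu,)
        return
    for first in range(nu + 1):
        for rest in classes(nu - first, kappa - 1):
            yield (first,) + rest


def class_frequency(P, p):
    """Multinomial frequency Pi_t(P), equation (4)."""
    C = factorial(sum(P)) // prod(factorial(Pa) for Pa in P)
    return C * prod(pa ** Pa for pa, Pa in zip(p, P))


def next_monomer_frequencies(p, nu, q, fitness):
    """One generation of equations (5)-(7) for monomer frequencies p = (p_1, ..., p_kappa)."""
    kappa = len(p)
    cls = list(classes(nu, kappa))
    freq = {P: class_frequency(P, p) for P in cls}
    W = 0.0             # average fitness W_t, equation (7)
    S = [0.0] * kappa   # S[alpha] = sum of Pi_t * A * (P^i_alpha + P^j_alpha)
    for Pi in cls:
        for Pj in cls:
            w = freq[Pi] * freq[Pj] * fitness(Pi, Pj)  # equation (6) times A(P^i, P^j)
            W += w
            for alpha in range(kappa):
                S[alpha] += w * (Pi[alpha] + Pj[alpha])
    m = (1.0 - q) / (kappa - 1)  # probability of mutating into one specific other monomer
    return [m + (q - m) * S[alpha] / (2 * nu * W) for alpha in range(kappa)]
```

For $\kappa = 2$ and the fitness landscape (8), the first component reproduces the closed-form recursion (9), which makes this brute-force version a convenient cross-check.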

In the remainder of this paper we will consider binary strings only. In this
case there are two types of monomers ($\kappa = 2$), so that the alleles are
characterized by a single parameter, namely, the number of monomers of type 1
they have, $P_1 = P$. The extension of our analysis to larger values of
$\kappa$ is straightforward. To proceed further we must specify the fitness of
the genotypes $P^iP^j$. Following Wiehe et al (1995), we consider the following
diploid analogue of the single-peaked fitness landscape:



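$$ A(P^i, P^j) = \begin{cases} (1+a)^2 & \text{if } P^i = P^j = \nu \\ (1+a)^{2h} & \text{if } P^i = \nu \text{ and } P^j \neq \nu \\ 1 & \text{if } P^i \neq \nu \text{ and } P^j \neq \nu, \end{cases} \qquad (8) $$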

where $a > 0$ is the parameter measuring the selective advantage of the
so-called master allele $P = \nu$, and $-\infty < h < \infty$ is the dominance
parameter. The master allele is completely dominant for $h = 1$ and completely
recessive for $h = 0$. For $h = 1/2$ we find $A(P^i, P^j) = A(P^i)\,A(P^j)$,
with $A(\nu) = 1 + a$ and $A(P) = 1$ for $P \neq \nu$, and so there is no
dominance. In this case equation (5) reduces to the equation that governs the
evolution of asexually reproducing haploid organisms (Alves and Fontanari
1996). Thus the intervals $h \in [0, 1/2)$ and $h \in (1/2, 1]$ delimit the
regions of recessivity and dominance, respectively, of the master allele. There
are other cases of interest as well: $h > 1$ models heterosis or hybrid vigor
(heterozygote advantage), while $h < 0$ models the phenomenon that occurs at
the early stages of speciation, when hybrids are less viable (heterozygote
disadvantage).
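For binary strings, where a class is labelled by the single integer $P$, the landscape (8) is a three-way case distinction. The hypothetical Python helpers below encode it, together with the haploid fitness $A(P) = 1 + a$ for $P = \nu$ and $A(P) = 1$ otherwise, so that the factorization $A(P^i, P^j) = A(P^i)A(P^j)$ at $h = 1/2$ can be checked directly.

```python
def diploid_fitness(Pi, Pj, nu, a, h):
    """Equation (8): fitness of genotype P^i P^j, where P counts monomers of type 1
    and the master allele is P = nu."""
    if Pi == nu and Pj == nu:
        return (1 + a) ** 2          # master homozygote
    if Pi == nu or Pj == nu:
        return (1 + a) ** (2 * h)    # heterozygote carrying one master allele
    return 1.0                       # no master allele


def haploid_fitness(P, nu, a):
    """Hypothetical helper: A(P) = 1 + a for the master allele, 1 otherwise."""
    return 1 + a if P == nu else 1.0


nu, a = 10, 2.0
# At h = 1/2 the diploid fitness factorizes into haploid fitnesses (no dominance).
print(all(abs(diploid_fitness(i, j, nu, a, 0.5)
              - haploid_fitness(i, nu, a) * haploid_fitness(j, nu, a)) < 1e-12
          for i in range(nu + 1) for j in range(nu + 1)))
```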

Inserting equation (8) into the recurrence equation (5) yields the following
equation for the frequency of monomers of type 1 in generation $t$,
$p_1(t) = p_t$:



$$ p_{t+1} = 1 - q + (2q - 1)\, \frac{K_1 p_t^{2\nu} + K_2 (p_t + 1)\, p_t^{\nu} + p_t}{K_1 p_t^{2\nu} + 2 K_2 p_t^{\nu} + 1}, \qquad (9) $$

where

$$ K_1 = (1+a)^2 - 2(1+a)^{2h} + 1 \qquad (10) $$

and

$$ K_2 = (1+a)^{2h} - 1. \qquad (11) $$
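The recursion (9) needs only a few lines of code. A minimal Python sketch, with hypothetical function names, follows; its denominator is the average fitness $W_t$ of equation (7) evaluated for the landscape (8).

```python
def K_constants(a, h):
    """Equations (10) and (11)."""
    K1 = (1 + a) ** 2 - 2 * (1 + a) ** (2 * h) + 1
    K2 = (1 + a) ** (2 * h) - 1
    return K1, K2


def step(p, nu, q, a, h):
    """One application of the recursion (9): p_t -> p_{t+1}."""
    K1, K2 = K_constants(a, h)
    pn = p ** nu                                # frequency of the master allele, p_t**nu
    num = K1 * pn * pn + K2 * (p + 1) * pn + p
    den = K1 * pn * pn + 2 * K2 * pn + 1        # average fitness W_t
    return 1 - q + (2 * q - 1) * num / den
```

At $h = 1/2$, where $K_1 = a^2$ and $K_2 = a$, the ratio simplifies to $(a p_t^\nu + p_t)/(1 + a p_t^\nu)$, which is the haploid asexual recursion mentioned above.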
In figure 1 we present the steady-state frequencies of alleles, obtained by
iterating the recursion equation (9) from an initial condition $p \approx 1$,
as a function of the error rate per monomer $1-q$ for $\nu = 10$, $a = 2$, and
different values of the dominance parameter. In the case of perfect
replication accuracy ($1-q = 0$), the fixed point $p^* = 0$ is always unstable,
while $p^* = 1$ is stable for $h < 1$ only. For $h > 1$, a third (stable) fixed
point $1/2 < p^* < 1$ appears, signalling the emergence of heterosis. For
$h < h_c \approx 1.75$ there are two distinct regimes: the quasispecies regime,
characterized by a population dominated by the master allele and its close
neighbors, and the uniform regime, where the $2^\nu$ alleles appear in the same
proportion (clearly the class $P = \nu/2$, which contains the most alleles, is
the most populated in this case). The error rate at which the discontinuous
transition between these two regimes takes place is termed the error threshold
$1-q_t$. As $h$ increases, the size of the jump at the transition decreases
until it disappears at a critical value $h = h_c$. Beyond that value it is no
longer possible to distinguish the two regimes.
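Steady states of the kind plotted in figure 1 can be approximated with the sketch below, under the assumption that plain fixed-point iteration of equation (9) starting from $p = 1$ converges. The names `steady_state` and `allele_class_frequencies` are hypothetical; the class frequencies follow from equation (4) with $\kappa = 2$, which is a binomial distribution. Convergence becomes slow close to the error threshold, so the iteration cap is a practical safeguard rather than part of the model.

```python
from math import comb


def step(p, nu, q, a, h):
    """Recursion (9) with K_1 and K_2 from equations (10) and (11)."""
    K1 = (1 + a) ** 2 - 2 * (1 + a) ** (2 * h) + 1
    K2 = (1 + a) ** (2 * h) - 1
    pn = p ** nu
    return 1 - q + (2 * q - 1) * (K1 * pn * pn + K2 * (p + 1) * pn + p) / (K1 * pn * pn + 2 * K2 * pn + 1)


def steady_state(nu, q, a, h, p0=1.0, tol=1e-13, max_iter=1_000_000):
    """Iterate equation (9) from p0 until successive values differ by less than tol."""
    p = p0
    for _ in range(max_iter):
        p_new = step(p, nu, q, a, h)
        if abs(p_new - p) < tol:
            return p_new
        p = p_new
    return p


def allele_class_frequencies(p, nu):
    """Binomial frequencies of the classes P = 0, ..., nu (equation (4) with kappa = 2)."""
    return [comb(nu, P) * p ** P * (1 - p) ** (nu - P) for P in range(nu + 1)]


for err in (0.01, 0.05, 0.1, 0.2):
    p_star = steady_state(10, 1 - err, 2.0, 0.0)
    print(err, round(p_star, 4), round(allele_class_frequencies(p_star, 10)[10], 4))
```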

To better characterize the error threshold transition we concentrate our
analysis on the nature of the fixed points $p_{t+1} = p_t = p^*$, which are
given by the real roots of $f(p) = 0$, where

$$ f(p) = K_1 (p - q)\, p^{2\nu} + K_2 (3p - 2pq - 1)\, p^{\nu} - (1-q)(1-2p). \qquad (12) $$

For small error rates this equation has only one real root, which corresponds
to the stable fixed point $p^* \approx 1$ associated with the quasispecies
regime. As the error rate increases, a double root appears, giving rise to two
new fixed points: a stable one, $p^* \approx 1/2$, associated with the uniform
regime, and an unstable one that delimits the basins of attraction of the
stable fixed points. These fixed points coexist until the error rate reaches
the threshold value $1-q_t$, where the stable quasispecies fixed point and the
unstable one coalesce. For larger error rates, equation (12) has only one real
root, which corresponds to the uniform fixed point. Thus, the error threshold
transition can be easily determined by solving $f(p) = df(p)/dp = 0$
simultaneously for $p$ and $q = q_t$. These equations have two solutions, one
where the uniform fixed point appears and one where the quasispecies fixed
point disappears; the error threshold is the one with the larger value of $p$.
The critical point $h_c$ is determined by tuning the value of $h$ so that the
three real roots of equation (12) coincide, i.e., we have to solve the three
equations $f(p) = df(p)/dp = d^2 f(p)/dp^2 = 0$ simultaneously for $p$,
$q = q_c$ and $h = h_c$.
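The prescription $f = df/dp = 0$ can be implemented without a two-dimensional root finder by exploiting the fact that $f(p)$ in equation (12) is linear in $q$. Solving $f = 0$ for $q$ gives the fixed-point curve $q(p)$, and since $df = (\partial f/\partial p)\,dp + (\partial f/\partial q)\,dq$, the condition $\partial f/\partial p = 0$ is equivalent to $dq/dp = 0$ wherever $\partial f/\partial q \neq 0$. The threshold is then the first local maximum of the error rate $1 - q(p)$ met as $p$ decreases from 1. This grid scan is a substitute technique chosen for robustness, not the method used in the text; the function names, the grid size and the lower limit `p_min = 0.55` are assumptions.

```python
def K_constants(a, h):
    """Equations (10) and (11)."""
    return (1 + a) ** 2 - 2 * (1 + a) ** (2 * h) + 1, (1 + a) ** (2 * h) - 1


def f(p, q, nu, a, h):
    """Equation (12); its real roots in p are the fixed points of equation (9)."""
    K1, K2 = K_constants(a, h)
    return (K1 * (p - q) * p ** (2 * nu) + K2 * (3 * p - 2 * p * q - 1) * p ** nu
            - (1 - q) * (1 - 2 * p))


def error_rate_on_fixed_point_curve(p, nu, a, h):
    """f = c0(p) + q * c1(p) is linear in q; return the error rate 1 - q = 1 + c0/c1
    at which p is a fixed point. Assumes c1 = df/dq is non-zero on the scanned interval."""
    K1, K2 = K_constants(a, h)
    pn = p ** nu
    c0 = K1 * p * pn * pn + K2 * (3 * p - 1) * pn - (1 - 2 * p)
    c1 = -K1 * pn * pn - 2 * K2 * p * pn + (1 - 2 * p)
    return 1 + c0 / c1


def error_threshold(nu, a, h, p_min=0.55, n=20000):
    """Scan p downwards from 1. The first local maximum of the error rate along the
    fixed-point curve is the solution of f = df/dp = 0 with the larger p, i.e.
    (p_t, 1 - q_t). Returns None if no such fold is found on the grid."""
    dp = (1.0 - p_min) / n
    ps = [1.0 - k * dp for k in range(1, n + 1)]
    es = [error_rate_on_fixed_point_curve(p, nu, a, h) for p in ps]
    for k in range(1, n - 1):
        if es[k] >= es[k - 1] and es[k] > es[k + 1]:
            return ps[k], es[k]
    return None


print(error_threshold(10, 2.0, 0.5))
```

For $\nu = 10$, $a = 2$ and $h = 1/2$, the error rate along the curve reduces to $2p^{10}(1-p)/(2p - 1 + 2p^{10})$, which offers an independent check of the code. The critical point $h_c$ could be located, for example, by bisection on $h$ for the value at which `error_threshold` stops finding a fold, although the fold becomes very shallow near $h_c$ and a finer grid is then needed.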

Using the prescriptions given above, we present in figure 2 the error threshold
transition lines as a function of $h$ for $\nu = 10$ and several values of $a$.
The error threshold $1-q_t$ is practically insensitive to variations of $h$ for
negative and small positive values of this parameter. It reaches its minimal
value around $h = 0.5$ (non-dominance regime) and then increases quickly as the
system enters the dominance region, $h > 0.5$. We note the reentrant behavior
of these transition lines: for certain values of $q$, the system undergoes two
discontinuous transitions as $h$ is increased. The transition lines end at
critical points, which are shown in figure 3 for different values of $\nu$. It
is interesting to note that only for $\nu < 7$ (or, more exactly,
$\nu < 6.93$) do the critical lines touch the axis $1-q_c = 0$. So, in these
cases, there are values of the dominance parameter $h > 0$ for which the error
threshold transition never occurs.

Up to now we have concentrated on the location of the error threshold as a
function of the dominance parameter. We turn now to the composition of the
population at the steady state, which can be characterized by the average
normalized Hamming distance from the master allele; within the population
genetics framework, this distance is given simply by $1-p^*$. This quantity is
shown in figure 4 as a function of the error rate for $\nu = 10$, $a = 0.5$ and
several values of $h$. What is remarkable about this figure is that there
exists a value of the error rate, $1-q = 1-q_r \approx 0.017$, at which the
fixed point $p^* \approx 0.912$ is independent of $h$. This fixed point,
however, becomes unstable for $h < h_s \approx 0.244$. Thus, although
recessivity leads to a higher concentration of the master allele for
$1-q < 1-q_r$, this allele is quickly lost from the population at larger error
rates. The main effect of dominance is to postpone the error catastrophe at the
price of reducing the concentration of the master allele in the population. We
note that at the inflection point $q = q_r$ the effects of dominance and
recessivity are reversed. This point can be easily determined by setting to
zero the coefficient of the term $(1+a)^{2h}$ in equation (12), divided by
$p^\nu$, namely,

$$ g(p) = -2(p - q)\, p^{\nu} + (3 - 2q)\, p - 1, \qquad (13) $$

and solving $g(p) = 0$ together with $f(p) = 0$ for $p$ and $q = q_r$.
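Because $h$ enters $f(p)$ only through $(1+a)^{2h}$, whose coefficient is $p^\nu g(p)$, any point with $g = f = 0$ is a fixed point for every $h$. The sketch below locates it in Python under two assumptions: $g(p) = 0$ is solved for $q$ (it is linear in $q$), and the remaining one-dimensional equation is bracketed by scanning $p \in [0.6, 0.999]$ and then bisected. The helper names and the scan interval are hypothetical choices.

```python
def f(p, q, nu, a, h):
    """Equation (12)."""
    K1 = (1 + a) ** 2 - 2 * (1 + a) ** (2 * h) + 1
    K2 = (1 + a) ** (2 * h) - 1
    return (K1 * (p - q) * p ** (2 * nu) + K2 * (3 * p - 2 * p * q - 1) * p ** nu
            - (1 - q) * (1 - 2 * p))


def q_on_g_curve(p, nu):
    """Solve g(p) = -2 (p - q) p**nu + (3 - 2q) p - 1 = 0 (equation (13)) for q."""
    return (2 * p ** (nu + 1) - 3 * p + 1) / (2 * (p ** nu - p))


def inflection_point(nu, a, p_lo=0.6, p_hi=0.999, n=4000, tol=1e-14):
    """Return (p_r, 1 - q_r) with g = f = 0, or None if no bracket is found.
    On the curve g = 0, f does not depend on h, so h = 0 is used."""
    def F(p):
        return f(p, q_on_g_curve(p, nu), nu, a, 0.0)

    dp = (p_hi - p_lo) / n
    for k in range(n):
        lo, hi = p_lo + k * dp, p_lo + (k + 1) * dp
        if not all(0.5 < q_on_g_curve(x, nu) <= 1.0 for x in (lo, hi)):
            continue  # keep only physically meaningful accuracies 1/2 < q <= 1
        if F(lo) * F(hi) <= 0:
            while hi - lo > tol:  # bisection on the bracketed sign change
                mid = 0.5 * (lo + hi)
                if F(lo) * F(mid) <= 0:
                    hi = mid
                else:
                    lo = mid
            p_r = 0.5 * (lo + hi)
            return p_r, 1 - q_on_g_curve(p_r, nu)
    return None


print(inflection_point(10, 0.5))
```

With $\nu = 10$ and $a = 0.5$, this sketch should return values close to those quoted in the text, $p^* \approx 0.912$ and $1 - q_r \approx 0.017$.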

Both analyses, the location of the error threshold and the composition of the
population, indicate that dominance allows the master allele to resist higher
replication error rates than in the case of non-dominance. Indeed, for
sufficiently large $h$, it can even avoid the error catastrophe. This finding
has been proposed as a possible explanation for the fact that the wild type,
i.e., the allele that predominates in a population and that is particularly
well suited to its environment, is often dominant: the dominant alleles might
be the prevailing wild-type ones simply because they can tolerate higher error
rates (Wiehe et al 1995).

In summary, we have employed the population genetics approach to the
quasispecies model to investigate the error catastrophe in the evolution of
diploid organisms. In order to highlight the non-trivial effects of imperfect
replication accuracy on the population composition, we have focused on a
simple diploid analogue of the single-peaked fitness landscape. Two distinct
steady-state regimes are observed: the quasispecies regime, where the
information about the environment, modelled by the fitness landscape, is
preserved in the population composition, and the uniform regime, where this
information is irreversibly lost. In the space of the parameters $1-q$ and
$h$, these regimes are separated by discontinuous transition lines that
terminate at critical points, beyond which the regimes become
indistinguishable. We have found that dominance ($h > 0.5$) can postpone or
even avoid the error catastrophe. It is interesting, however, that a recessive
allele ($h < 0.5$) can do better than a non-dominant one ($h = 0.5$).

To conclude, we mention that our results are in qualitative agreement with
those of Wiehe et al (1995). Since the population genetics approach to the
quasispecies model incorporates only a few essential features of the original
chemical kinetics formulation, this agreement provides strong evidence for the
robustness of the main conclusion drawn from the model, namely, the existence
of an error catastrophe that limits the size of self-replicating organisms.
Rather than just a caricature of the original model, the population genetics
model presented in this paper may be viewed as a simpler, alternative model
for investigating the evolution of self-replicating organisms, which may
greatly facilitate the analysis of difficult problems such as error
propagation in finite populations and the effects of cooperation or catalysis
among the evolving organisms.
Figure captions



Fig. 1 Steady-state frequencies of alleles belonging to classes $P = 10$
(master allele) to $P = 0$ as a function of the error rate per digit $1-q$ for
$\nu = 10$, $a = 2$, and (a) $h = 0$, (b) $h = 1$, (c) $h = 1.5$, and (d) $h = 2$.

Fig. 2 Error threshold $1-q_t$ as a function of the dominance parameter $h$ for
$\nu = 10$ and (from top to bottom) $a = 314.8$, 186.1, 57.0, 18.8, 4.4, 2.0,
and 1.3. The parameter $a$ was chosen so that the transition lines end at
critical points located at $h = 0$, 0.4, 0.5, 0.6, 1.0, 1.5, and 2.0,
respectively.

Fig. 3 Error threshold at the critical point, $1-q_c$, as a function of the
dominance parameter $h$ for (from top to bottom) $\nu = 10$ to $\nu = 2$.

Fig. 4 Average normalized Hamming distance from the master allele as a function
of the replication error rate for $\nu = 10$, $a = 0.5$, and (from top to
bottom before the intersection) $h = 2$, 1.75, 1.5, 1.25, 1, 0.75, 0.5, 0.25,
and 0. The curves for $h > h_s = 0.244$ intersect at the inflection point
$1-q_r = 0.017$.